Proximal policy optimization via enhanced exploration efficiency
Abstract
The proximal policy optimization (PPO) algorithm is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks, but its performance is still affected by its exploration ability. Based on continuous control tasks, this paper analyzes the original Gaussian action exploration mechanism of the PPO algorithm and clarifies the influence of exploration ability on performance. Afterward, aiming at the problem of exploration, an exploration enhancement mechanism based on uncertainty estimation is designed in this paper. We then apply this theory to PPO and propose the proximal policy optimization algorithm with an intrinsic exploration module (IEM-PPO). In the experimental part, we evaluate our method on multiple tasks in the MuJoCo physical simulator and compare IEM-PPO with the curiosity-driven exploration algorithm (ICM-PPO). The results demonstrate that IEM-PPO performs better in terms of sample efficiency and cumulative reward, and has stability and robustness.
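The abstract does not spell out how the uncertainty estimate is computed, so the following Python sketch uses one common stand-in: the disagreement of an ensemble of learned dynamics models, turned into an intrinsic bonus that is added to the environment reward before the PPO update. The class names, network sizes, and the `scale` coefficient are illustrative assumptions, not the paper's design.

```python
# Hedged sketch: one common way to turn uncertainty estimation into an
# intrinsic exploration bonus, in the spirit of what the abstract
# describes for IEM-PPO. Uncertainty is proxied here by the variance
# (disagreement) of an ensemble of learned dynamics models; all names
# and the scale=0.01 coefficient are illustrative, not from the paper.
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Predicts the next state from (state, action)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def intrinsic_reward(ensemble, state, action, scale: float = 0.01):
    """Uncertainty bonus: variance of next-state predictions across the
    ensemble, averaged over state dimensions. High disagreement marks
    under-explored regions and yields a larger bonus."""
    with torch.no_grad():
        preds = torch.stack([m(state, action) for m in ensemble])  # (E, B, S)
        return scale * preds.var(dim=0).mean(dim=-1)               # (B,)

# Per transition, the reward fed to the PPO update would then be:
#   r_total = r_extrinsic + intrinsic_reward(ensemble, s, a)
```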
Similar Resources
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a “surrogate” objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of ...
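For reference, here is a minimal Python/PyTorch sketch of the clipped surrogate loss this abstract alludes to; the clip range eps = 0.2 and the tensor names are conventional choices, not taken from the text above. The loss is negated so that a standard optimizer's gradient descent performs the paper's gradient ascent.

```python
# Hedged sketch of the PPO clipped surrogate objective, optimized for
# multiple epochs on the same sampled batch. eps=0.2 is a common
# default, assumed here rather than quoted from the abstract.
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps: float = 0.2):
    """L^CLIP = E[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ],
    where r_t = pi_new(a|s) / pi_old(a|s). Returned negated so that
    gradient descent on this loss performs gradient ascent on L^CLIP."""
    ratio = torch.exp(log_probs_new - log_probs_old)  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```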
Enhanced Delta-tolling: Traffic Optimization via Policy Gradient Reinforcement Learning
The prospect of widespread deployment of autonomous vehicles invites the reimagining of the multiagent systems protocols that govern traffic flow in our cities. One such possibility is the introduction of micro-tolling for fine-grained traffic flow optimization. In the micro-tolling paradigm, different toll values are assigned to different links within a congestable traffic network. Self-intere...
Derivative-Free Optimization Via Proximal Point Methods
Derivative-Free Optimization (DFO) examines the challenge of minimizing (or maximizing) a function without explicit use of derivative information. Many standard techniques in DFO are based on using model functions to approximate the objective function, and then applying classic optimization methods on the model function. For example, the details behind adapting steepest descent, conjugate gradi...
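As a rough illustration of the proximal point building block mentioned here: each outer step minimizes the objective plus a quadratic penalty anchoring the iterate to the previous point, and the inner subproblem can itself be solved derivative-free. In this hypothetical Python sketch the inner solver is SciPy's Nelder-Mead; the objective, the step parameter lam, and the iteration count are arbitrary choices for illustration.

```python
# Hedged sketch of a proximal point iteration. The inner subproblem
# x_{k+1} = argmin_y f(y) + (1 / (2*lam)) * ||y - x_k||^2
# is solved derivative-free with Nelder-Mead; lam and iters are
# illustrative, not taken from the abstract.
import numpy as np
from scipy.optimize import minimize

def proximal_point(f, x0, lam: float = 1.0, iters: int = 20):
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # Quadratic penalty ties the new iterate to the previous one.
        prox = lambda y, xk=x: f(y) + np.sum((y - xk) ** 2) / (2 * lam)
        x = minimize(prox, x, method="Nelder-Mead").x
    return x

# Example: minimizing a nonsmooth function without derivatives.
print(proximal_point(lambda y: np.abs(y).sum(), x0=[3.0, -2.0]))
```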
Variational Policy Search via Trajectory Optimization
In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present...
Identifying Reusable Macros for Efficient Exploration via Policy Compression
Reinforcement Learning agents often need to solve not a single task, but several tasks pertaining to the same domain; in particular, each task corresponds to an MDP drawn from a family of related MDPs (a domain). An agent learning in this setting should be able to exploit policies it has learned in the past, for a given set of sample tasks, in order to more rapidly acquire policies for novel tasks. ...
Journal
Journal title: Information Sciences
Year: 2022
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2022.07.111